We consider the state complexity of basic operations on tree languages recognized by deterministic
unranked tree automata. For the operations of union and intersection the upper and lower bounds for
both weakly and strongly deterministic tree automata are obtained. For tree concatenation we establish
a tight upper bound that is of a different order than the known state complexity of concatenation
of regular string languages. We show that $(n+1)((m+1)2^n - 2^{n-1}) - 1$ vertical states are sufficient,
and necessary in the worst case, to recognize the concatenation of tree languages recognized
by (strongly or weakly) deterministic automata with, respectively, $m$ and $n$ vertical states.
1 Introduction

As XML [1] has played increasingly important roles in data representation and exchange through the
web, tree automata have gained renewed interest, particularly tree automata operating on unranked trees. 
XML documents can be abstracted as unranked trees, which makes unranked tree automata a natural and 
fundamental model for various XML processing tasks [2, 9, 13]. Both deterministic and nondeterministic
unranked tree automata have been studied. 

One method to handle unranked trees is to encode them as ranked trees and then use the classical 
theory of ranked tree automata. However, the encoding may result in trees of unbounded height since 
there is no a priori restriction on the number of the children of a node in unranked trees. Also depending 
on various applications, it may be difficult to come up with a proper choice of the encoding method. 

Descriptional complexity of finite automata and related structures has been extensively studied in 
recent years [6, 14, 15]. Here we consider operational state complexity of deterministic unranked
tree automata. Operational state complexity describes how the size of an automaton varies under
regularity preserving operations. The corresponding results for string languages are well known [8, 14, 16];
however, very few results have been obtained for tree automata. While state complexity results for tree 
automata operating on ranked trees are often similar to corresponding results on regular string automata 
fPfl . the situation becomes essentially different for automata operating on unranked trees. An unranked 
tree automaton has two different types of states, called horizontal and vertical states, respectively. There 
are also other automaton models that can be used to process unranked trees, such as nested word automata 
and stepwise tree automata. The state complexity of these models has been studied in Bl fTOlfTTI . 

We study two different models of determinism for unranked tree automata. We call the usual deter- 
ministic unranked tree automaton [2] model where the horizontal languages defining the transitions are 
specified by DFAs (deterministic finite automata), a weakly deterministic tree automaton (or WDTA). 
For the other variant of determinism for unranked tree automata, we refer to the corresponding automa- 
ton model as a strongly deterministic unranked tree automaton (or SDTA). This model was introduced by 
Cristau, Löding and Thomas [3], see also Raeymaekers and Bruynooghe [12]. SDTAs can be minimized
efficiently and the minimal automaton is unique [3]. On the other hand, the minimization problem for
WDTAs is NP-complete and the minimal automaton need not be unique [10].

We give upper and lower bounds for the numbers of both vertical and horizontal states for the op- 
erations of union and intersection. The upper bounds for vertical states are tight for both SDTAs and 
WDTAs. We also get upper bounds which are almost tight for the number of the horizontal states of 
SDTAs. Obtaining a matching lower bound for the horizontal states of WDTAs turns out to be very 
problematic. This is mainly because the minimal WDTA may not be unique and the minimization of 
WDTAs is intractable [10]. Also, the number of horizontal states of WDTAs can be reduced by adding
vertical states, i.e., there can be trade-offs between the numbers of horizontal and vertical states, respec- 
tively. 

The upper bounds for the number of vertical states for union and intersection of WDTAs and SDTAs 
are, as expected, similar to the upper bound for the corresponding operation on ordinary string automata. 
Already in the case of union and intersection, the upper bounds for the numbers of horizontal states are
dramatically different for WDTAs and SDTAs, respectively. In an SDTA, the horizontal language associated
with label $\sigma$ is represented with a single DFA $H_\sigma$ augmented with an output function $\lambda$. The
state assigned to a node labeled with $\sigma$ is determined by the final state reached in $H_\sigma$ and $\lambda$. On the
other hand, in a WDTA, the horizontal languages associated with a given label $\sigma$ and different states are
represented by distinct DFAs. The state assigned to a node labeled with $\sigma$ depends on the choice of the
DFA.

We also consider the state complexity of (tree) concatenation of SDTAs. It is well known that
$m2^n - 2^{n-1}$ states are sufficient to accept the concatenation of an $m$ state DFA and an $n$ state DFA [16].
However, the tight upper bound to accept the concatenation of unranked tree automata, with $m$ and $n$
vertical states respectively, turns out to be $(n+1)((m+1)2^n - 2^{n-1}) - 1$. The factor $(n+1)$ is necessary
here because the automaton accepting the concatenation of two tree languages must keep track of the 
computations where no concatenation has been done. For string concatenation, there is only one path 
and the concatenation always takes place somewhere on that path. For non-unary trees, there is no way 
that the automaton can foretell on which branch the concatenation is done and, consequently, the automa- 
ton for concatenation needs considerably more states. It should be emphasized that this phenomenon is 
not caused by any particular construction used for the automaton to accept the concatenation of given 
tree languages, and we have a matching lower bound result. 
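
To make the difference in order concrete, the two bounds quoted above can be evaluated side by side. The following is a minimal sketch; the function names `string_concat_bound` and `tree_concat_bound` are hypothetical and only restate the two formulas for $m, n \ge 1$.

```python
def string_concat_bound(m: int, n: int) -> int:
    """State complexity m*2^n - 2^(n-1) of DFA concatenation quoted from [16]."""
    return m * 2**n - 2 ** (n - 1)


def tree_concat_bound(m: int, n: int) -> int:
    """Vertical-state bound (n+1)((m+1)2^n - 2^(n-1)) - 1 for tree concatenation."""
    return (n + 1) * ((m + 1) * 2**n - 2 ** (n - 1)) - 1


if __name__ == "__main__":
    # Print both bounds for a few small sizes.
    for m, n in [(2, 2), (3, 2), (3, 3), (4, 4)]:
        print(m, n, string_concat_bound(m, n), tree_concat_bound(m, n))
```

The tree bound exceeds the string bound by roughly the factor $(n+1)$ discussed above.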

Since complementation is an "easy" operation for both strongly and weakly deterministic tree au- 
tomata, we do not investigate its state complexity in this paper. Note that we do not require the automaton 
models to be complete (i.e., some transitions may be undefined). A (strongly or weakly) deterministic 
automaton accepting the complement of a tree language recognized by the same type of automaton would 
need at most one additional vertical state and it is easy to see that this bound can be reached in the worst 
case. 

The paper is organized as follows. Definitions of unranked tree automata and other notations are
given in Section 2. The upper bounds and corresponding lower bounds for union and intersection of
SDTAs are presented in Section 3.1. In Section 3.2 the state complexity of union and intersection of
WDTAs is discussed. The tight bound for the number of vertical states for tree concatenation of SDTAs
is given in Section 4. The same construction works for WDTAs.
2 Preliminaries

Here we briefly recall some notations and definitions concerning trees and tree automata. A general 
reference on tree automata is [2].

Let $\mathbb{N}$ be the set of non-negative integers. A tree domain $D$ is a finite set of elements in $\mathbb{N}^*$ with the
following two properties: (i) If $w \in D$ and $u$ is a prefix of $w$ then $u \in D$. (ii) If $ui \in D$, $i \in \mathbb{N}$ and $j < i$
then $uj \in D$. The nodes in an unranked tree $t$ can be denoted by a tree domain $\mathrm{dom}(t)$, and $t$ is a mapping
from $\mathrm{dom}(t)$ to the set of labels $\Sigma$. The set of $\Sigma$-labeled trees is $T_\Sigma$.
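
The two closure properties of a tree domain can be checked mechanically. The sketch below is only an illustration under the assumption that a node is encoded as a tuple of non-negative integers (the empty tuple being the root); the name `is_tree_domain` is hypothetical.

```python
def is_tree_domain(nodes) -> bool:
    """Check properties (i) and (ii) for a finite set of integer tuples."""
    dom = set(nodes)
    for w in dom:
        if not w:
            continue  # the root has no parent and no left sibling
        # (i) prefix-closed: requiring the parent of every node suffices by induction
        if w[:-1] not in dom:
            return False
        # (ii) left-sibling-closed: requiring the immediate left sibling suffices
        if w[-1] > 0 and w[:-1] + (w[-1] - 1,) not in dom:
            return False
    return True


if __name__ == "__main__":
    print(is_tree_domain({(), (0,), (1,), (1, 0)}))  # a root with two children
    print(is_tree_domain({(), (1,)}))                # child 1 without child 0
```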

For $t, t' \in T_\Sigma$ and $u \in \mathrm{dom}(t')$, $t'(u \leftarrow t)$ denotes the tree obtained from $t'$ by replacing the subtree
at node $u$ by $t$. The concatenation of trees $t$ and $t'$ is defined as $t \cdot t' = \{ t'(u \leftarrow t) \mid u \in \mathrm{leaf}(t') \}$. The
concatenation operation is extended in the natural way to sets of trees $L_1$, $L_2$:

$$L_1 \cdot L_2 = \bigcup_{t_1 \in L_1,\, t_2 \in L_2} t_1 \cdot t_2.$$
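
As an illustration of this definition, the following sketch computes $t \cdot t'$ and its extension to sets of trees. The encoding of a tree as a nested tuple `(label, children)` and the names `concat` and `concat_languages` are assumptions made for this example only.

```python
def concat(t, t_prime):
    """Return the set t . t' = { t'(u <- t) | u is a leaf of t' }."""
    label, children = t_prime
    if not children:
        # t' is a single leaf: replacing it yields t itself
        return {t}
    result = set()
    for i, child in enumerate(children):
        # replace one leaf inside the i-th subtree, keep the other subtrees
        for new_child in concat(t, child):
            result.add((label, children[:i] + (new_child,) + children[i + 1:]))
    return result


def concat_languages(l1, l2):
    """Extend tree concatenation to finite sets of trees L1 . L2."""
    return {s for t1 in l1 for t2 in l2 for s in concat(t1, t2)}


if __name__ == "__main__":
    leaf_a = ("a", ())
    t = ("c", ())
    t_prime = ("b", (leaf_a, leaf_a))
    for tree in sorted(concat(t, t_prime)):
        print(tree)
```

Note that $t \cdot t'$ is a set: a tree $t'$ with $k$ leaves gives up to $k$ different results, one for each leaf that is replaced.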

We denote a tree $t = b(a_1, \ldots, a_n)$, whose root is labeled by $b$ and leaves are labeled by $a_1, \ldots, a_n$,
simply as $b(a_1 \ldots a_n)$. When $a_1 = \ldots = a_n = a$, we write $t = b(a^n)$. By a slight abuse of notation, for a
unary tree $t = a_1(a_2(\ldots(a_n)\ldots))$, we write $t = a_1 a_2 \cdots a_n$ for abbreviation. When $a_1 = \ldots = a_n = a$, we
write $t = a^n$ for short. (In each case it should be clear from the context whether $a^n$ refers to a sequence
of leaves or to a unary tree.)

Next we briefly recall the definitions of the two variants of deterministic bottom-up tree automata
considered here. A weakly deterministic unranked tree automaton (WDTA) is a 4-tuple $A = (Q, \Sigma, \delta, F)$
where $Q$ is a finite set of states, $\Sigma$ is the alphabet, $F \subseteq Q$ is the set of final states, $\delta$ is a mapping from
$Q \times \Sigma$ to the subsets of $(Q \cup \Sigma)^*$ which satisfies the condition that, for each $q \in Q$, $\sigma \in \Sigma$, $\delta(q,\sigma)$ is
a regular language and for each label $\sigma$ and every two states $q_1 \neq q_2$, $\delta(q_1,\sigma) \cap \delta(q_2,\sigma) = \emptyset$. The
language $\delta(q,\sigma)$ is called the horizontal language associated with $q$ and $\sigma$ and it is specified by a DFA
$H_{q,\sigma}$.

Roughly speaking, a WDTA operates as follows. If $A$ has reached the children of a $\sigma$-labelled node
$u$ in states $q_1, q_2, \ldots, q_n$, the computation assigns state $q$ to node $u$ provided that $q_1 q_2 \cdots q_n \in \delta(q,\sigma)$. In
the sequence $q_1 q_2 \cdots q_n$ an element $q_i \in \Sigma$ is interpreted to correspond to a leaf labeled by that symbol. A
WDTA is a deterministic hedge automaton [2] where each horizontal language is specified using a DFA.

Note that in the usual definition of [2] the horizontal languages are subsets of $Q^*$. In order to simplify
some constructions, we allow also the use of symbols of the alphabet $\Sigma$ in the horizontal languages, where
a symbol $\sigma \in \Sigma$ occurring in a word of a horizontal language is always interpreted to label a leaf of the
tree. The convention does not change the state complexity bounds in any significant way because we use
small constant size alphabets and we can think that the tree automaton assigns to each leaf labeled by
$\sigma \in \Sigma$ a particular state that is not used anywhere else in the computation.

A strongly deterministic unranked tree automaton (SDTA) is a 4-tuple $A = (Q, \Sigma, F, \delta)$, where $Q, \Sigma, F$
are similarly defined as for WDTAs. For each $\sigma \in \Sigma$, the horizontal languages $\delta(q,\sigma)$, $q \in Q$, are
defined by a single DFA augmented with an output function as follows. For $\sigma \in \Sigma$ define
$D_\sigma = (S_\sigma, Q \cup \Sigma, s^0_\sigma, \gamma_\sigma, E_\sigma, \lambda_\sigma)$ where $(S_\sigma, Q \cup \Sigma, s^0_\sigma, \gamma_\sigma, E_\sigma)$ is a DFA and $\lambda_\sigma$ is a mapping
$S_\sigma \to Q$. For all $q \in Q$ and $\sigma \in \Sigma$, the horizontal language $\delta(q,\sigma)$ is specified by $D_\sigma$ as the set
$\{ w \in (Q \cup \Sigma)^* \mid \lambda_\sigma(\gamma^*_\sigma(s^0_\sigma, w)) = q \}$.
Intuitively, when $A$ has reached the children of a node $u$ labelled by $\sigma$ in states $q_1, \ldots, q_m$ (an element
$q_i \in \Sigma$ is interpreted as a label of a leaf node), the state at $u$ is determined (via the function $\lambda_\sigma$) by the
state that the DFA $D_\sigma$ reaches after reading the word $q_1 \cdots q_m$. More information on SDTAs can be found
in [3].
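
The bottom-up computation of an SDTA can be sketched as follows. This is a simplified illustration, not the paper's formal model: it assumes that every node, including each leaf, is assigned a vertical state (so horizontal words range over $Q^*$ only, as in the usual definition of [2]), and that the output function is given only on accepting horizontal states. The names `run_sdta`, `size`, `HORIZONTAL` and `FINAL`, the tuple encoding of trees, and the example automaton are all hypothetical.

```python
def run_sdta(tree, horizontal):
    """Return the vertical state reached at the root, or None if undefined.

    tree: (label, children), children being a tuple of trees.
    horizontal[label] = (start, gamma, lam): gamma maps
    (horizontal_state, vertical_state) -> horizontal_state and lam maps
    accepting horizontal states to vertical states.
    """
    label, children = tree
    if label not in horizontal:
        return None
    start, gamma, lam = horizontal[label]
    state = start
    for child in children:
        q = run_sdta(child, horizontal)  # vertical state of the child
        if q is None:
            return None
        state = gamma.get((state, q))    # one step of the horizontal DFA
        if state is None:
            return None
    return lam.get(state)                # output function of the label


def size(horizontal, vertical_states):
    """Return (number of vertical states, total number of horizontal states)."""
    total = 0
    for start, gamma, lam in horizontal.values():
        states = {start} | {s for s, _ in gamma} | set(gamma.values()) | set(lam)
        total += len(states)
    return (len(vertical_states), total)


# Example: leaves are labeled a; every b-labeled node needs at least one child.
HORIZONTAL = {
    "a": ("s0", {}, {"s0": "leaf"}),
    "b": ("h0",
          {("h0", "leaf"): "h1", ("h0", "ok"): "h1",
           ("h1", "leaf"): "h1", ("h1", "ok"): "h1"},
          {"h1": "ok"}),
}
FINAL = {"ok"}

if __name__ == "__main__":
    leaf = ("a", ())
    t = ("b", (leaf, ("b", (leaf,))))
    print(run_sdta(t, HORIZONTAL) in FINAL)
    print(size(HORIZONTAL, {"leaf", "ok"}))
```

In this sketch each label has exactly one horizontal DFA, and the vertical state is read off from the horizontal state reached, which is the feature that distinguishes SDTAs from WDTAs.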
Given a tree automaton $A = (Q, \Sigma, F, \delta)$, the states in $Q$ are called vertical states. The DFAs recognizing
the horizontal languages are called horizontal DFAs and their states are called horizontal states.
We define the (state) size of $A$, $\mathrm{size}(A)$, as a pair of integers $[\,|Q|;\ n\,]$, where $n$ is the sum of the sizes of all
horizontal DFAs associated with $A$.

3 Union and intersection 

We investigate the state complexity of union and intersection operations on unranked tree automata. The 
upper bounds on the numbers of vertical states are similar for SDTAs and WDTAs; however, the upper
bounds on the numbers of horizontal states differ between the two models. 

3.1 Strongly deterministic tree automata 

The following result gives the upper bounds and the lower bounds for the operations of union and inter- 
section for SDTAs. 

Theorem 3.1 For any two arbitrary SDTAs $A_i = (Q_i, \Sigma, \delta_i, F_i)$, $i = 1,2$, whose transition function associated
with $\sigma$ is represented by a DFA $H^i_\sigma = (C^i_\sigma, Q_i \cup \Sigma, \gamma^i_\sigma, c^i_{\sigma,0}, E^i_\sigma)$, we have

1 The language $L(A_1) \cup L(A_2)$ can be recognized by an SDTA $B_\cup$ with

$$\mathrm{size}(B_\cup) \le \Big[\, (|Q_1|+1) \times (|Q_2|+1) - 1;\ \sum_{\sigma \in \Sigma} \big( (|C^1_\sigma|+1) \times (|C^2_\sigma|+1) - 1 \big) \,\Big].$$

2 The language $L(A_1) \cap L(A_2)$ can be recognized by an SDTA $B_\cap$ with

$$\mathrm{size}(B_\cap) \le \Big[\, |Q_1| \times |Q_2|;\ \sum_{\sigma \in \Sigma} |C^1_\sigma| \times |C^2_\sigma| \,\Big].$$

3 For integers $m, n \ge 1$ and relatively prime numbers $k_1, k_2, \ldots, k_m, k_{m+1}, \ldots, k_{m+n}$,
there exist tree languages $T_1$ and $T_2$ such that $T_1$ and $T_2$, respectively, can be recognized by
SDTAs with $m$ and $n$ vertical states, $\prod_{i=1}^{m} k_i + O(m)$ and $\prod_{i=m+1}^{m+n} k_i + O(n)$ horizontal states, and

i any SDTA recognizing $T_1 \cup T_2$ has at least $(m+1)(n+1) - 1$ vertical states and $\prod_{i=1}^{m+n} k_i$
horizontal states.

ii any SDTA recognizing $T_1 \cap T_2$ has at least $mn$ vertical states and $\prod_{i=1}^{m+n} k_i$ horizontal states.

The upper bounds on vertical and horizontal states are obtained from product constructions, and 
Theorem 3.1 shows that for the operations of union and intersection on SDTAs the upper bounds are
tight for vertical states and almost tight for horizontal states. 
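
The text says only that the upper bounds come from product constructions. As an illustration of where the vertical-state counts $(m+1)(n+1) - 1$ and $mn$ come from, the sketch below enumerates candidate state pairs. It assumes the usual product construction in which an undefined computation of one automaton is tracked by an extra component value; the marker `DEAD` and the function names are hypothetical and not taken from the paper.

```python
from itertools import product

DEAD = "dead"  # hypothetical marker for an undefined computation


def union_vertical_states(q1, q2):
    """Pairs over (Q1 + dead) x (Q2 + dead), without the pair (dead, dead)."""
    pairs = product(list(q1) + [DEAD], list(q2) + [DEAD])
    return [(p, q) for p, q in pairs if (p, q) != (DEAD, DEAD)]


def intersection_vertical_states(q1, q2):
    """Pairs over Q1 x Q2: both computations must be defined."""
    return list(product(q1, q2))


if __name__ == "__main__":
    m, n = 3, 4
    print(len(union_vertical_states(range(m), range(n))))
    print(len(intersection_vertical_states(range(m), range(n))))
```

For union, one automaton may still accept after the other has become undefined, which is why the extra component is needed; for intersection, a pair with an undefined component can never lead to acceptance and can be dropped because the automata need not be complete.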

3.2 Weakly deterministic automata 

In this section, we give upper bounds on the numbers of vertical and horizontal states for the operations
of union and intersection on WDTAs, followed by matching lower bounds on the
numbers of vertical states.

Lemma 3.1 Given two WDTAs $A_i = (Q_i, \Sigma, \delta_i, F_i)$, $i = 1,2$, each horizontal language $\delta_i(q,\sigma)$ is represented
by a DFA $D^i_{q,\sigma} = (C^i_{q,\sigma}, Q_i \cup \Sigma, \gamma^i_{q,\sigma}, c^i_{q,\sigma,0}, E^i_{q,\sigma})$.
The language $L(A_1) \cup L(A_2)$ can be recognized by a WDTA $B_\cup$ with

size(Su) < [ (|Qi| + 1) x (|(2 2 | + 1)- 1; 

xn, ee , 1^1)] 

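The language $L(A_1) \cap L(A_2)$ can be recognized by a WDTA $B_\cap$ with

$$\mathrm{size}(B_\cap) \le \Big[\, |Q_1| \times |Q_2|;\ |\Sigma| \times \sum_{q \in Q_1,\, p \in Q_2} |D^1_{q,\sigma}| \times |D^2_{p,\sigma}| \,\Big].$$
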
The theorem below shows that the upper bounds for the vertical states are tight. 

Theorem 3.2 For any two WDTAs $A_1$ and $A_2$ with $m$ and $n$ vertical states respectively, we have

1 $L(A_1) \cup L(A_2)$ can be recognized by a WDTA with at most $(m+1)(n+1) - 1$ vertical states,

2 $L(A_1) \cap L(A_2)$ can be recognized by a WDTA with at most $mn$ vertical states,

3 for any integers $m, n \ge 1$, there exist tree languages $T_1$ and $T_2$ such that $T_1$ and $T_2$ can be recognized
by WDTAs with $m$ and $n$ vertical states respectively, and any WDTA recognizing $T_1 \cup T_2$ has at least
$(m+1)(n+1) - 1$ vertical states, and any WDTA recognizing $T_1 \cap T_2$ has at least $mn$ vertical states.

Open problem 1 Are the upper bounds for the numbers of horizontal states given in Lemma 3.1 tight?

In the case of WDTAs we do not have a general method to establish lower bounds on the number of the 
horizontal states. It remains an open question to give (reasonably) tight lower bounds on the number 
of horizontal states needed to recognize the union or intersection of tree languages recognized by two 
WDTAs.

4 Concatenation of strongly deterministic tree automata 

We begin by giving a construction of an SDTA recognizing the concatenation of two tree languages 
recognized by given SDTAs. 

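Lemma 4.1 Let $A_1$ and $A_2$ be two arbitrary SDTAs, $A_i = (Q_i, \Sigma, \delta_i, F_i)$, $i = 1,2$, where the transition function for
each $\sigma \in \Sigma$ is represented by a DFA $H^i_\sigma = (C^i_\sigma, Q_i \cup \Sigma, \gamma^i_\sigma, c^i_{\sigma,0}, E^i_\sigma)$ with an output function $\lambda^i_\sigma$.
The language $L(A_2) \cdot L(A_1)$ can be recognized by an SDTA $B$ with

$$\mathrm{size}(B) \le \Big[\, (|Q_1|+1) \times \big( 2^{|Q_1|} \times (|Q_2|+1) - 2^{|Q_1|-1} \big) - 1;\ |\Sigma|\,(|C^2_\sigma|+1)(|C^1_\sigma|+1) \times 2^{|C^1_\sigma|+1} \,\Big].$$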

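Proof. Choose $B = (Q'_1 \times Q''_1 \times Q'_2, \Sigma, \delta, F)$, where $Q'_1 = Q_1 \cup \{\mathit{dead}\}$, $Q''_1 = \mathcal{P}(Q_1)$, $Q'_2 = Q_2 \cup \{\mathit{dead}\}$.
Let $P_2 \subseteq Q_1$. $(p_1, P_2, q) \in Q'_1 \times Q''_1 \times Q'_2$ is final if there exists $p \in P_2$ such that $p \in F_1$.

The transition function $\delta$ associated with each $\sigma$ is represented by a DFA
$H^B_\sigma = (S \times S'' \times S', (Q'_1 \times Q''_1 \times Q'_2) \cup \Sigma, \mu, (c^1_{\sigma,0}, (\{c^1_{\sigma,0}\}, 0), c^2_{\sigma,0}), V)$
with an output function $\lambda^B_\sigma$, where $S = C^1_\sigma \cup \{\mathit{dead}\}$, $S'' = \mathcal{P}(C^1_\sigma) \times \{0,1\}$, $S' = C^2_\sigma \cup \{\mathit{dead}\}$.
Let $C_2 \subseteq C^1_\sigma$, $x = 1, 0$. $(c_1, (C_2, x), c_2) \in S \times S'' \times S'$ is final if $c_2 \in E^2_\sigma$
or there exists $c \in \{c_1\} \cup C_2$ such that $c \in E^1_\sigma$. $\mu$ is defined as below: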

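For any input $a \in \Sigma$,

$$\mu((c_1,(C_2,x),c_2), a) = \Big( \gamma^1_\sigma(c_1,a),\ \big( \bigcup_{c \in C_2} \gamma^1_\sigma(c,a),\ x \big),\ \gamma^2_\sigma(c_2,a) \Big).$$

For any input $(p_1, P_2, q) \in Q'_1 \times Q''_1 \times Q'_2$, if $P_2 \neq \emptyset$,

$$\mu((c_1,(C_2,0),c_2),(p_1,P_2,q)) = \Big( \gamma^1_\sigma(c_1,p_1),\ \big( \bigcup_{p_2 \in P_2} \gamma^1_\sigma(c_1,p_2),\ 1 \big),\ \gamma^2_\sigma(c_2,q) \Big),$$

$$\mu((c_1,(C_2,1),c_2),(p_1,P_2,q)) = \Big( \gamma^1_\sigma(c_1,p_1),\ \big( \bigcup_{p_2 \in P_2} \gamma^1_\sigma(c_1,p_2) \cup \bigcup_{c \in C_2} \gamma^1_\sigma(c,p_1),\ 1 \big),\ \gamma^2_\sigma(c_2,q) \Big);$$

if $P_2 = \emptyset$,

$$\mu((c_1,(C_2,0),c_2),(p_1,\emptyset,q)) = \Big( \gamma^1_\sigma(c_1,p_1),\ (\emptyset, 0),\ \gamma^2_\sigma(c_2,q) \Big),$$

$$\mu((c_1,(C_2,1),c_2),(p_1,\emptyset,q)) = \Big( \gamma^1_\sigma(c_1,p_1),\ \big( \bigcup_{c \in C_2} \gamma^1_\sigma(c,p_1),\ 1 \big),\ \gamma^2_\sigma(c_2,q) \Big).$$

Write the computation above in an abbreviated form as $\mu((c_1,(C_2,x),c_2), r) = (p'_1, P'_2, q')$, $r \in \Sigma \cup Q'_1 \times Q''_1 \times Q'_2$.
When computing $p'_1$ and $q'$, if any $\gamma^i_\sigma(c,a)$, $i = 1,2$, $c = c_1, c_2$, $a \in \Sigma \cup Q_i$, is not defined in $A_i$,
assign $\mathit{dead}$ to $p'_1$ or $q'$. When computing $P'_2$, add nothing to $P'_2$ if any $\gamma^1_\sigma(c,a)$ is not defined.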

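Let $p_{\mathrm{leaf}} \in Q_1$ denote the state assigned to the leaf in $A_1$ substituted by a tree in $L(A_2)$. $\lambda^B_\sigma$ is defined
as follows: for any final state $e = (c_1, (C_2,x), c_2)$, let $x_1 = \{c_1\} \cap E^1_\sigma$ and $X_2 = C_2 \cap E^1_\sigma$.

1 If $c_2 \in E^2_\sigma$,

$$\lambda^B_\sigma(e) = \begin{cases}
(\lambda^1_\sigma(x_1),\ \{p_{\mathrm{leaf}}\} \cup \bigcup_{x_2 \in X_2} \lambda^1_\sigma(x_2),\ \lambda^2_\sigma(c_2)), & \text{if } \lambda^2_\sigma(c_2) \in F_2 \text{ and } x = 1, \\
(\lambda^1_\sigma(x_1),\ \{p_{\mathrm{leaf}}\},\ \lambda^2_\sigma(c_2)), & \text{if } \lambda^2_\sigma(c_2) \in F_2 \text{ and } x = 0, \\
(\lambda^1_\sigma(x_1),\ \bigcup_{x_2 \in X_2} \lambda^1_\sigma(x_2),\ \lambda^2_\sigma(c_2)), & \text{if } \lambda^2_\sigma(c_2) \notin F_2 \text{ and } x = 1, \\
(\lambda^1_\sigma(x_1),\ \emptyset,\ \lambda^2_\sigma(c_2)), & \text{if } \lambda^2_\sigma(c_2) \notin F_2 \text{ and } x = 0.
\end{cases}$$

2 If $c_2 \notin E^2_\sigma$,

$$\lambda^B_\sigma(e) = \begin{cases}
(\lambda^1_\sigma(x_1),\ \emptyset,\ \mathit{dead}), & \text{if } x = 0, \\
(\lambda^1_\sigma(x_1),\ \bigcup_{x_2 \in X_2} \lambda^1_\sigma(x_2),\ \mathit{dead}), & \text{if } x = 1.
\end{cases}$$

If $x_1 = \emptyset$, define $\lambda^1_\sigma(x_1) = \mathit{dead}$. If $X_2 = \emptyset$, define $\bigcup_{x_2 \in X_2} \lambda^1_\sigma(x_2) = \emptyset$.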

The state in $B$ has three components $(p_1, P_2, q)$. $p_1$ is used to keep track of $A_1$'s computation where
no concatenation is done. $p_1$ is computed by the first component $c_1$ in the state of $H^B_\sigma$. $P_2$ traces the
computation where the concatenation takes place. In a state $(c_1, (C_2,x), c_2)$ of $H^B_\sigma$, $x = 1$ (or $x = 0$)
records that there is (or is not) a concatenation in the computation. The third component $q$ keeps track of the
computation of $A_2$. When a final state is reached in $A_2$, which means a concatenation might take place,
an initial state $p_{\mathrm{leaf}}$ is added to $P_2$, which is achieved by the function $\lambda^B_\sigma$ in $B$.

According to the definition of $\lambda^B_\sigma$, when $\lambda^2_\sigma(c_2) \in F_2$, $p_{\mathrm{leaf}}$ is always in the second component of the
state. We exclude the cases where $\lambda^2_\sigma(c_2) \in F_2$ and $p_{\mathrm{leaf}}$ is not in the second component of the state, and we
do not require $B$ to be complete. Hence $B$ has $(|Q_1|+1) \times (2^{|Q_1|} \times (|Q_2|+1) - 2^{|Q_1|-1}) - 1$ vertical states in the worst
case. ■

Lemma 4.1 gives an upper bound on the numbers of both vertical and horizontal states of an SDTA recognizing
the concatenation of $L(A_2)$ and $L(A_1)$. In the following we give a matching lower bound for the number
of vertical states of any SDTA recognizing $L(A_2) \cdot L(A_1)$.

For our lower bound construction we define tree languages consisting of trees where, roughly speaking,
each branch belongs to the worst-case languages used for string concatenation in [16] and, furthermore,
the minimal DFA reaches the same state at an arbitrary node $u$ in computations starting from any
two leaves below $u$. For technical reasons, all leaves of the trees are labeled by a fixed symbol and the
strings used to define the tree language do not include the leaf symbols.

Figure 1: DFA A and B. (a) DFA A. (b) DFA B.

As shown in Figure 1, $A$ and $B$ are the DFAs used in Theorem 1 of [16] except that a self-loop labeled
by an additional symbol $d$ is added to each state in $B$. We use the symbol $d$ as an identifier of DFA $B$,
which always leads to a dead state in the computations of $A$. This will be useful for establishing that all
vertical states of the SDTA constructed as in Lemma 4.1 are needed to recognize the concatenation of
tree languages defined below.

Based on the DFAs $A$ and $B$ we define the tree languages $T_A$ and $T_B$ used in our lower bound construction.
The tree language $T_B$ consists of $\Sigma$-labeled trees $t$, $\Sigma = \{a,b,c,d\}$, where:

1. All leaves are labeled by $a$ and if a node $u$ has a child that is a leaf, then all the children of $u$ are
leaves.

2. $B$ accepts the string of symbols labeling a path from any node of height one to the root.

3. The following holds for any $u \in \mathrm{dom}(t)$ and any nodes $v_1$ and $v_2$ of height one below $u$. If $w_i$ is
the string of symbols labeling the path from $v_i$ to $u$, $i = 1, 2$, then $B$ reaches the same state after
reading strings $w_1$ and $w_2$.

Intuitively, the above condition means that when, on a tree of $T_B$, the DFA $B$ reads strings of symbols
labeling paths starting from nodes of height one upwards, the computations corresponding to different
paths "agree" at each node. This property is used in the construction of an SDTA $M_B$ for $T_B$ below.

Note that the computations of $B$ above are started from the nodes of height one and they ignore the
leaf symbols. This is done for technical reasons because in tree concatenation a leaf symbol is replaced
by a tree, i.e., the original symbol labeling the leaf will not appear in the resulting tree.

$T_B$ can be recognized by an SDTA $M_B = (Q_B, \{a,b,c,d\}, \delta_B, F_B)$ where $Q_B = \{0, 1, \ldots, n-1\}$ and
$F_B = \{n-1\}$. The transition function is defined as:

(1) $\delta_B(0,a) = \varepsilon$,

(2) $\delta_B(i,a) = i^+$, $0 \le i \le n-1$,

(3) $\delta_B(i,d) = i^+$, $0 \le i \le n-1$,

(4) $\delta_B(j,b) = (j-1)^+$, $1 \le j \le n-1$, and $\delta_B(0,b) = (n-1)^+$,

(5) $\delta_B(1,c) = \{0, \ldots, n-1\}^+$.

The tree language $T_A$ and an SDTA $M_A$ recognizing it are defined similarly based on the DFA $A$. Note
that $T_A$ has no occurrences of the symbol $d$ and $M_A$ has no transitions defined on $d$. The SDTAs $M_A$ and
$M_B$ have $m$ and $n$ vertical states, respectively.
An SDTA $C$ recognizing tree language $T_A \cdot T_B$¹ is obtained from $M_A$ and $M_B$ using the construction
given in Lemma 4.1. The vertical states in $C$ are of the following form

$$(q, S, p),\quad 0 \le q \le n,\ S \subseteq \{0, 1, \ldots, n-1\},\ 0 \le p \le m, \qquad (1)$$

where if $p = m-1$ then $0 \in S$, and if $S = \emptyset$ then $q = n$ and $p = m$ can not both be true. The number of
states in (1) is $(n+1)((m+1)2^n - 2^{n-1}) - 1$. State $q = n$ (or $p = m$) denotes $q = \mathit{dead}$ (or $p = \mathit{dead}$)
in the construction of Lemma 4.1. We will show that $C$ needs at least $(n+1)((m+1)2^n - 2^{n-1}) - 1$
vertical states. We prove this by showing that each state in (1) is reachable and all states are pairwise
inequivalent, or distinguishable. Here distinguishability means that for any distinct states $q_1$ and $q_2$ there
exists $t \in T_\Sigma[x]$ such that the (unique deterministic) computation of $C$ on $t(x \leftarrow q_1)$ leads to acceptance if
and only if the computation of $C$ on $t(x \leftarrow q_2)$ does not lead to acceptance.
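
The stated number of states of form (1) can be checked by brute force for small $m$ and $n$. The following is a minimal sketch; the function names `count_states` and `formula` are hypothetical, and `q == n` and `p == m` encode the dead states as described above.

```python
from itertools import combinations


def count_states(m: int, n: int) -> int:
    """Count triples (q, S, p) of form (1) that satisfy both side conditions."""
    count = 0
    for q in range(n + 1):                      # q == n stands for dead
        for r in range(n + 1):
            for s in combinations(range(n), r):  # all subsets S of {0,...,n-1}
                for p in range(m + 1):           # p == m stands for dead
                    if p == m - 1 and 0 not in s:
                        continue                 # p = m-1 forces 0 to be in S
                    if not s and q == n and p == m:
                        continue                 # the all-dead state is excluded
                    count += 1
    return count


def formula(m: int, n: int) -> int:
    """The closed form (n+1)((m+1)2^n - 2^(n-1)) - 1."""
    return (n + 1) * ((m + 1) * 2**n - 2 ** (n - 1)) - 1


if __name__ == "__main__":
    for m in range(1, 5):
        for n in range(1, 5):
            print(m, n, count_states(m, n), formula(m, n))
```

The first exclusion removes $(n+1)2^{n-1}$ triples and the second removes one more, which accounts for the two subtracted terms in the closed form.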

Lemma 4.2 All states of $C$ are reachable.

Proof. We introduce the following notation. For a unary tree
$t = a_1(a_2(\ldots a_m(b) \ldots))$, we denote $\mathrm{word}(t) = a_m a_{m-1} \ldots a_1 \in \Sigma^*$. Note that $\mathrm{word}(t)$ consists of the
sequence of labels of $t$ from the node of height one to the root, and the label of the leaf is not included.
We show that all the states in (1) are reachable by using induction on $|S|$.
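
The notation $\mathrm{word}(t)$ can be illustrated with a few lines of code. This sketch assumes trees are nested tuples `(label, children)` with single-character labels; the name `word` follows the text, while the encoding is an assumption of the example.

```python
def word(t) -> str:
    """Labels of a unary tree from the node of height one up to the root."""
    labels = []
    label, children = t
    while children:
        if len(children) != 1:
            raise ValueError("word(t) is only defined for unary trees")
        labels.append(label)         # collect a_1, a_2, ..., a_m from the root down
        label, children = children[0]
    # the leaf label is never appended; reverse to read from height one upwards
    return "".join(reversed(labels))


if __name__ == "__main__":
    leaf = ("a", ())
    t = ("a", (("a", (("b", (leaf,)),)),))  # the unary tree a(a(b(a)))
    print(word(t))
```

For the tree $a(a(b(a)))$ the node of height one is labeled $b$, so the word read upwards is $baa = b^1 a^2$.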

When $|S| = 0$, $(i, \emptyset, j)$, $0 \le i \le n-1$, $0 \le j \le m-2$ is reachable from $(0, \emptyset, 0)$ by reading tree $t$ where
$\mathrm{word}(t) = b^i a^j$. State $(n, \emptyset, j)$, $1 \le j \le m-2$ is reachable from $(0, \emptyset, 0)$ by reading tree $a(t_1, t_2)$ where
$\mathrm{word}(t_1) = b a^{j-1}$ and $\mathrm{word}(t_2) = b^2 a^{j-1}$. State $(n, \emptyset, 0)$ is reachable by reading symbol $b$ from state
$(n, \emptyset, j)$, $1 \le j \le m-2$. State $(i, \emptyset, m)$, $0 \le i \le n-1$ is reachable from $(0, \emptyset, 0)$ by reading tree $b(t_1, t_2)$
where $\mathrm{word}(t_1) = b^{i-1} a$ and $\mathrm{word}(t_2) = b^{i-1} a^2$.

When $|S| = 1$, $(i, \{0\}, m-1)$, $0 \le i \le n-1$ is reachable from $(0, \emptyset, 0)$ by reading tree $t$ where
$\mathrm{word}(t) = b^i a^{m-1}$.

State $(n, \{0\}, m-1)$ is reachable from $(0, \emptyset, 0)$ by reading tree $a(t_1, t_2)$ where $\mathrm{word}(t_1) = b a^{m-2}$ and
$\mathrm{word}(t_2) = b^2 a^{m-2}$.

State $(i, \{0\}, j)$, $0 \le i \le n$, $0 \le j \le m-2$ is reachable from $(i, \{0\}, m-1)$ by reading a sequence of
unary symbols.

State $(i, \{0\}, m)$, $0 \le i \le n-1$ is reachable from $(0, \emptyset, 0)$ by reading tree $t$ where $\mathrm{word}(t) = b^i a^{m-1} d$.
From $(0, \emptyset, 0)$ by reading subtree $b(b(a), b(b(a)))$, state $(n, \emptyset, 0)$ is reached. State $(n, \{0\}, m)$ is
reached from $(n, \emptyset, 0)$ by reading a sequence of unary symbols $a^{m-1} d$.
That is, all the states $(i, \{0\}, j)$, $0 \le i \le n$, $0 \le j \le m$ are reachable.

Then state $(i, \{k\}, j)$, $0 \le i \le n-1$, $0 \le j \le m-1$, $1 \le k \le n-1$ is reachable from $(\overline{i-1}, \{k-1\}, j)$
by reading a sequence of unary symbols $b a^j$. For any integer $x$,

$$\overline{x} = \begin{cases} x & \text{if } x \ge 0, \\ n + x & \text{if } x < 0. \end{cases}$$

State $(n, \{k\}, j)$, $0 \le j \le m-1$, $1 \le k \le n-1$ is reachable from $(n, \{k-1\}, j)$ by reading a sequence of
unary symbols $b a^j$. State $(i, \{k\}, m)$, $0 \le i \le n-1$, $1 \le k \le n-1$ is reachable from $(\overline{i-1}, \{k-1\}, m)$ by
reading a unary symbol $b$. State $(n, \{k\}, m)$, $1 \le k \le n-1$ is reachable from $(n, \{k-1\}, m)$ by reading a
unary symbol $b$.

That is, all the states $(i, \{k\}, j)$, $0 \le i \le n$, $0 \le j \le m$, $0 \le k \le n-1$ are reachable.

¹ Recall from Section 2 that $T_A \cdot T_B$ consists of trees where in some tree of $T_B$ a leaf is replaced by a tree of $T_A$.

Now assume that for $|S| \le z$, all the states $(i, S, j)$, $0 \le i \le n$, $0 \le j \le m$, $S \subseteq \{0, \ldots, n-1\}$ are
reachable. This is the inductive assumption.

We will show that any state $(x, S', y)$, $0 \le x \le n$, $0 \le y \le m$, $|S'| = z+1$ is reachable.

First consider the case where $y \neq m-1$. Let $s_1 > s_2 > \cdots > s_z > s_{z+1}$ be the elements in $S'$. Let
$P = \{s_1 - s_{z+1}, s_2 - s_{z+1}, \ldots, s_z - s_{z+1}\}$.

When $0 \le x \le n-1$, according to the inductive assumption, state $(\overline{x - s_{z+1}}, P, 0)$ is reachable. Then
state $(\overline{x - s_{z+1}}, P \cup \{0\}, m-1)$ is reachable from $(\overline{x - s_{z+1}}, P, 0)$ by reading a sequence of unary symbols
$a^{m-1}$. State $(x, S', y)$, $0 \le y \le m-2$ is reachable from $(\overline{x - s_{z+1}}, P \cup \{0\}, m-1)$ by reading a sequence of
unary symbols $b^{s_{z+1}} a^y$. State $(x, S', m)$ is reachable from $(\overline{x - s_{z+1}}, P \cup \{0\}, m-1)$ by reading a sequence
of unary symbols $b^{s_{z+1}} d$.

When $x = n$, according to the inductive assumption, state $(n, P, 0)$ is reachable. Then state
$(n, P \cup \{0\}, m-1)$ is reachable from $(n, P, 0)$ by reading a sequence of unary symbols $a^{m-1}$. State $(n, S', y)$,
$0 \le y \le m-2$ is reachable from $(n, P \cup \{0\}, m-1)$ by reading a sequence of unary symbols $b^{s_{z+1}} a^y$. State
$(n, S', m)$ is reachable from $(n, P \cup \{0\}, m-1)$ by reading a sequence of unary symbols $b^{s_{z+1}} d$.

Now consider the case when $y = m-1$. According to the definition of (1), $0 \in S'$. According to
the inductive assumption, state $(x, S' - \{0\}, m-2)$ is reachable. Then state $(x, S', m-1)$ is reachable by
reading a unary symbol $a$.

Since $(x, S', y)$ is an arbitrary state with $|S'| = z+1$, we have proved that all the states $(x, S', y)$,
$0 \le x \le n$, $0 \le y \le m$, $|S'| = z+1$ are reachable.

Thus, all the states in (1) are reachable. ■

Lemma 4.3 All states of $C$ are pairwise inequivalent.²

According to the upper bound in Lemma 4.1 and Lemmas 4.2 and 4.3, we have proved the following
theorem.

Theorem 4.1 For arbitrary SDTAs $A_1$ and $A_2$, where $A_i = (Q_i, \Sigma, \delta_i, F_i)$, $i = 1,2$, the language $L(A_2) \cdot L(A_1)$
can be recognized by an SDTA $B = (Q, \Sigma, \delta, F)$ with $|Q| \le (|Q_1|+1) \times (2^{|Q_1|} \times (|Q_2|+1) - 2^{|Q_1|-1}) - 1$.

For any integers $m, n \ge 1$, there exist tree languages $T_A$ and $T_B$, such that $T_A$ and $T_B$ can be recognized
by SDTAs having $m$ and $n$ vertical states, respectively, and any SDTA recognizing $T_A \cdot T_B$ needs at least
$(n+1)((m+1)2^n - 2^{n-1}) - 1$ vertical states.

We do not have a matching lower bound for the number of horizontal states given by Lemma 4.1.
With regard to the number of vertical states, both the upper bound of Lemma 4.1 and the lower bound
of Theorem 4.1 can be immediately modified for WDTAs. (The proof holds almost word for word.) In
the case of WDTAs, getting a good lower bound for the number of horizontal states would likely be very
hard.

5 Conclusion 

We have studied the operational state complexity of two variants of deterministic unranked tree automata.
For union and intersection, tight upper bounds on the number of vertical states were established for both
strongly and weakly deterministic automata. An almost tight upper bound on the number of horizontal
states was obtained in the case of strongly deterministic unranked tree automata. For weakly deterministic
automata, lower bounds on the numbers of horizontal states are hard to establish because there can
be trade-offs between the numbers of vertical and horizontal states. This is indicated also by the fact that
minimization of weakly deterministic unranked tree automata is intractable and the minimal automaton
need not be unique [10].

² Proof omitted due to length restriction.

As ordinary strings can be viewed as unary trees, it is easy to predict that the state complexity of
a given operation for tree automata should be greater than or equal to the state complexity of the
corresponding operation on string languages. As our main result, we showed that for deterministic unranked
tree automata, the state complexity of concatenation of an $m$ state and an $n$ state automaton is at most
$(n+1)((m+1)2^n - 2^{n-1}) - 1$ and that this bound can be reached in the worst case. The bound is of a
different order than the known state complexity $m2^n - 2^{n-1}$ of concatenation of regular string languages.